A geographical statistical approach to the 2018 New Zealand Index of Multiple Deprivation

GEOG 351: Quantitative Analysis

Alice Clauss (260945446)

Stages

  • Background

  • Methods

  • Results

  • Conclusions

Background

Background: Data

My project focuses on the New Zealand Index of Multiple Deprivation (IMD), a ranking of deprivation by data zone in New Zealand that was created in 2013 and updated in 2018 (Exeter et al. 2018).

I also used data from the 2018 New Zealand Census at the Statistical Area 2 (SA2) level, focusing on educational qualifications, main means of travel to work, and Maori heritage (Stats NZ 2020).

Background: Index weighting

imd18 <- readxl::read_excel("IMD2018.xlsx", sheet = "IMD18") # Import data
ncol(imd18) # Number of columns
[1] 41
nrow(imd18) # Number of entries
[1] 6181
  • 6181 entries (read_excel() treats the first row as column names, so it is not counted)
  • these are at a different geographic level than the census data
  • only 15 distinct variables once duplicates are set aside

Figure 1. Weights of the IMD (Exeter et al. 2018).

Background: Census wrangling

# Necessary for importing data nicely
library(tidyverse)

# Importing and cleaning education data
education <- read_csv("2018-census-place-summaries-csv/2018-census-place-summaries-education-table2-2018-csv.csv") %>%
  filter(Area_type == "Statistical Area 2")
education <- education %>%
  filter(Maori_ethnic_group_indicator_summary_description == "Total") %>%
  select(Year, Area_type, Area_code, Highest_qualification_description, Highest_qualification_percent) %>%
  pivot_wider(names_from = Highest_qualification_description, values_from = Highest_qualification_percent) %>%
  select(!c(`Not elsewhere included`, Total))

# Importing and cleaning ethnicity data
ethnicity <- read_csv("2018-census-place-summaries-csv/2018-census-place-summaries-ethnicity-table1-2018-csv.csv") %>%
  filter(Area_type == "Statistical Area 2")
ethnicity <- ethnicity %>%
  select(Year, Area_type, Area_code, Maori_descent_description, Maori_descent_indicator_percent) %>%
  pivot_wider(names_from = Maori_descent_description, values_from = Maori_descent_indicator_percent) %>%
  select(!c(`Response unidentifiable`, `Not stated`, `Total`))

# Cleaning transport data, spitting out the percentage of modes used for work, selecting SA2
transport <- read_csv("2018-census-place-summaries-csv/2018-census-place-summaries-transport-table1-2018-csv.csv")
transport <- transport %>%
  select(Year, Area_type, Area_code, `Main_means_of_travel_to_work_description`, `Main_means_of_travel_to_work_percent`) %>%
  pivot_wider(names_from = `Main_means_of_travel_to_work_description`, values_from = `Main_means_of_travel_to_work_percent`) %>%
  filter(Area_type == "Statistical Area 2") %>%
  select(!c(`Did not go to work today`, `Not elsewhere included`))

# Binding the census data together
vars <- left_join(education, ethnicity, by = "Area_code") %>%
  left_join(., transport, by = "Area_code")
vars <- vars %>%
  select(!c(Year, Year.y, Area_type.y, Area_type.x, Area_type)) %>%
  rename(
    `Total Education` = `Total stated.x`,
    `Year` = Year.x,
    `Total Ethnicity` = `Total stated.y`,
    `Total Transport` = `Total stated`
  )

# Creating groupings for education
vars <- vars %>%
  mutate(
    Secondary = as.numeric(`Level 1 certificate`) + as.numeric(`Level 2 certificate`) + as.numeric(`Level 3 certificate`) + as.numeric(`Overseas secondary school qualification`),
    `Some University` = as.numeric(`Level 4 certificate`) + as.numeric(`Level 5 diploma`) + as.numeric(`Level 6 diploma`),
    Tertiary = as.numeric(`Bachelor's degree and level 7 qualification`) + as.numeric(`Post-graduate and honours degrees`),
    `Post-tertiary` = as.numeric(`Master's degree`) + as.numeric(`Doctorate degree`),
    # Note: unlike `Some University`, this sum leaves out `Level 6 diploma`
    `Any University` = as.numeric(`Level 4 certificate`) + as.numeric(`Level 5 diploma`) +
      as.numeric(`Bachelor's degree and level 7 qualification`) +
      as.numeric(`Post-graduate and honours degrees`) +
      as.numeric(`Master's degree`) + as.numeric(`Doctorate degree`)
    ) %>%
  select(!c(`Level 1 certificate`, `Level 2 certificate`, `Level 3 certificate`,
            `Level 4 certificate`, `Level 5 diploma`, `Level 6 diploma`,
            `Bachelor's degree and level 7 qualification`, `Post-graduate and honours degrees`,
            `Master's degree`, `Doctorate degree`))

Background: Census wrangling

This is where several problems that have been building come to a head: the census areas and the IMD data zones no longer line up.

2013 Census Areas \(=\) 2013 IMD Data zones

2018 Statistical Area 2 \(\neq\) 2018 IMD Data zones

Number of 2018 Statistical Area 2 units \(<\) number of 2018 IMD data zones

Background: Census wrangling

  • Imported SA2 Boundaries, census data, and IMD18 data to ArcGIS Pro
  • Education, Ethnicity, and Transport data were joined to the SA2 Boundaries
  • Using a spatial join (greatest overlap), census data was joined to the IMD18 data.
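
The join itself was done in ArcGIS Pro, so the following is only a toy sketch of the greatest-overlap rule, not the actual workflow. It uses invented axis-aligned rectangles in base R: each data zone receives the attributes of the SA2 it overlaps most. All ids, coordinates, and percentages are made up. For real polygons in R, `sf::st_join()` with `largest = TRUE` applies the same rule.

```r
# Toy version of a "greatest overlap" join using axis-aligned rectangles.
# All coordinates, ids, and values below are invented for illustration.
sa2 <- data.frame(
  sa2_id  = c("A", "B"),
  xmin = c(0, 2), xmax = c(2, 4),
  ymin = c(0, 0), ymax = c(2, 2),
  no_qual = c(12.5, 20.1)   # hypothetical census percentage
)
zones <- data.frame(
  zone_id = c("z1", "z2", "z3"),
  xmin = c(0, 1.5, 3), xmax = c(1, 3, 4),
  ymin = c(0, 0, 1),   ymax = c(1, 1, 2)
)

# Area of intersection between one zone and every SA2 rectangle
overlap_area <- function(zone, areas) {
  w <- pmax(0, pmin(zone$xmax, areas$xmax) - pmax(zone$xmin, areas$xmin))
  h <- pmax(0, pmin(zone$ymax, areas$ymax) - pmax(zone$ymin, areas$ymin))
  w * h
}

# For each zone, pick the SA2 with the largest intersection area
best <- vapply(seq_len(nrow(zones)), function(i) {
  which.max(overlap_area(zones[i, ], sa2))
}, integer(1))

# Attach the winning SA2's attributes to each zone
joined <- cbind(zones["zone_id"], sa2[best, c("sa2_id", "no_qual")])
rownames(joined) <- NULL
print(joined)
```

Because the SA2 units are larger than the data zones, several zones can inherit the same SA2 values, as z2 and z3 do here.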

Methods

Methods: Ordinal data

The IMD is ordinal data. Each data zone is ranked from 1 to 6181, with a rank of 1 indicating the lowest deprivation and 6181 the highest. I therefore relied mainly on methods that accept ordinal data, although I also used the rank as the response in a linear regression.

Methods: Hypotheses

  • There will be greater deprivation with fewer educational qualifications

    • And conversely, there will be less deprivation with more educational qualifications
  • There will be less deprivation with greater private car and bicycle use

  • There will be more deprivation with greater bus use

  • There will be greater deprivation in areas with a larger share of people of Maori heritage

Methods: Variables

To avoid an inherent correlation between the predictors and the response variable, I chose to use the Index of Multiple Deprivation that excludes education data.

Response

RnkIMDNoEdu: Ranked index of multiple deprivation that was made without education data.

Predictors

No_qual: % of people with no formal qualification

AnyUni: % of people who have any university education

Master_: % of people with post-bachelor’s education

Maori: % of people with Maori heritage

Drive_priv: % of people who drive their own vehicle to work

Bus: % of people who take the bus to work

Bike: % of people who bike to work

Methods: Normality

Methods: Transformations

Methods: Analysis

  • Spearman Correlation

    • Rank-based (ordinal) measure of correlation between variables, ranging from -1 (perfect negative association) to 1 (perfect positive association)
  • Moran’s I

    • Test for spatial correlation of the data (run on the response variable)
  • Linear Regression

    • Requires independence of variables, normality of residuals, and homoscedasticity of residuals.

    • Spatial Lag Regression with Queen weighting

    • Spatial Error Regression with Queen weighting
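
As a rough sketch of what Moran's I with queen weighting measures, the block below computes the global statistic by hand in base R on an invented 4 x 4 grid. The grid, the two value patterns, and the helper `morans_i()` are assumptions for illustration; the real analysis would build neighbours from the data zone polygons.

```r
# Global Moran's I with row-standardised queen contiguity weights on a toy grid.
# The grid and values are invented; real data zones need polygon neighbours.
n_side <- 4
cells  <- expand.grid(row = seq_len(n_side), col = seq_len(n_side))
n      <- nrow(cells)

# Queen contiguity: cells are neighbours if they touch by an edge or a corner
W <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    dr <- abs(cells$row[i] - cells$row[j])
    dc <- abs(cells$col[i] - cells$col[j])
    if (i != j && dr <= 1 && dc <= 1) W[i, j] <- 1
  }
}
W <- W / rowSums(W)   # row-standardise so each row sums to 1

# I = (n / sum of weights) * sum_ij(w_ij * z_i * z_j) / sum_i(z_i^2)
morans_i <- function(x, W) {
  z <- x - mean(x)
  (length(x) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

clustered    <- cells$row + cells$col          # smooth gradient across the grid
checkerboard <- (cells$row + cells$col) %% 2   # alternating 0/1 pattern

morans_i(clustered, W)      # positive: similar values sit next to each other
morans_i(checkerboard, W)   # negative: neighbours tend to differ
```

The same weights matrix `W` is what the spatial lag and spatial error models use to describe which zones influence each other.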

Results

Results: Spearman Correlation

Spearman correlation coefficients, which range from \(-1\) to \(1\).

           R N lg_m D lg_bs M A lg_bk
RnkIMDNoEd 1                         
No_qual    . 1                       
log_maori  , , 1                     
Drive_priv . .      1                
log_bus      . .      1              
Masters_   . + ,    . .     1        
AnyUni     . * ,    . .     * 1      
log_bike                        1    
attr(,"legend")
[1] 0 ' ' 0.3 '.' 0.6 ',' 0.8 '+' 0.9 '*' 0.95 'B' 1

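The legend at the bottom maps each symbol to a range of absolute Spearman correlation (for example, '.' marks values between 0.3 and 0.6); it does not report significance levels.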
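
A symbol matrix of this kind can be produced along the following lines. This is a minimal sketch on simulated data: the data frame `toy` and its values are invented, and only the column names loosely mimic the variables in this project.

```r
# Spearman correlation is Pearson correlation computed on ranks.
# The data below are simulated; column names only mimic the ones in the talk.
set.seed(351)
n <- 200
no_qual <- runif(n, 0, 40)
toy <- data.frame(
  rank_imd = rank(no_qual + rnorm(n, sd = 10)),  # stand-in for a deprivation rank
  No_qual  = no_qual,
  AnyUni   = 60 - no_qual + rnorm(n, sd = 5)
)

rho <- cor(toy, method = "spearman")
round(rho, 2)

# Same result from Pearson correlation on the ranked columns
all.equal(rho, cor(sapply(toy, rank)))

# Compact symbol matrix; the legend gives cut-points on |rho|, not p-values
symnum(rho)

# A significance test for a single pair is a separate step
cor.test(toy$rank_imd, toy$No_qual, method = "spearman")$p.value
```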

Results: Global Moran’s I

Results: Local Moran’s I

Results: Multiple Linear Regression

Results: Multiple Linear Regression

Results: Assumptions of Multiple Linear Regression

Normality of residuals

Histogram of residuals, right skewed

Heteroscedasticity

The Breusch-Pagan test rejects the null hypothesis of homoscedasticity with a p-value that rounds to 0, indicating that the residuals are heteroscedastic.
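
To show what the test is doing, here is a hedged base R sketch of the studentised (Koenker) form of the Breusch-Pagan test on simulated data. The data are invented, and the reported result above may come from a different variant of the test than the one sketched here.

```r
# Studentised (Koenker) Breusch-Pagan test computed by hand on simulated data.
set.seed(1)
n <- 500
x <- runif(n, 1, 10)
y <- 2 + 3 * x + rnorm(n, sd = x)   # error spread grows with x: heteroscedastic

fit <- lm(y ~ x)
e2  <- residuals(fit)^2

# Auxiliary regression of squared residuals on the predictors
aux     <- lm(e2 ~ x)
lm_stat <- n * summary(aux)$r.squared      # LM statistic = n * R^2
df      <- length(coef(aux)) - 1           # number of predictors
p_value <- pchisq(lm_stat, df = df, lower.tail = FALSE)

c(statistic = lm_stat, df = df, p.value = p_value)

# Shows tiny p-values in scientific notation (or as "<eps") instead of a rounded 0
format.pval(p_value)
```

A small p-value says the squared residuals vary systematically with the predictors, which is the pattern the homoscedasticity assumption rules out.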

No collinearity

With GeoDa’s multicollinearity condition number at 33.22, just above the commonly cited warning threshold of 30, there may still be some collinearity between the predictors, though it improved substantially after Master_ was removed.
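
The sketch below shows one way such a condition number can be computed, assuming the Belsley-Kuh-Welsch definition (columns of the design matrix scaled to unit length, then the ratio of the largest to the smallest singular value). Whether GeoDa uses exactly this recipe is an assumption here, and the helper `condition_number()` and the simulated predictors are hypothetical.

```r
# Multicollinearity condition number, assuming the Belsley-Kuh-Welsch definition:
# scale each column of the design matrix (intercept included) to unit length,
# then take the ratio of the largest to the smallest singular value.
condition_number <- function(X) {
  X_scaled <- sweep(X, 2, sqrt(colSums(X^2)), "/")
  sv <- svd(X_scaled)$d
  max(sv) / min(sv)
}

set.seed(2)
n  <- 300
x1 <- rnorm(n, mean = 50, sd = 10)
x2 <- x1 + rnorm(n, sd = 1)    # nearly a copy of x1: strongly collinear
x3 <- rnorm(n, mean = 50, sd = 10)

with_copy    <- model.matrix(~ x1 + x2 + x3)
without_copy <- model.matrix(~ x1 + x3)

condition_number(with_copy)
condition_number(without_copy)   # smaller once the near-duplicate is dropped
```

Dropping a nearly redundant predictor lowers the number, which mirrors the improvement seen after removing the post-bachelor's variable.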

Results: Spatial Lag Linear Regression

Histogram of residuals

Results: Spatial Error Linear Regression

Histogram of residuals

Conclusions

Conclusions

Ultimately, we can conclude that cycling to work, Maori ethnicity, lack of educational qualifications, bus ridership, and car driving function well as predictors of deprivation, particularly when accounting for the spatial dependence in the data.

Works cited

Exeter, Daniel John, Arier Chi Lun Lee, Jinfeng Zhao, Sue Crengle, Annie Chiang, and Michael Browne. 2018. “2018 Index of Multiple Deprivation.” https://hgd.auckland.ac.nz/imd18/.
Stats NZ. 2020. “2018 Census Place Summaries.” https://www.stats.govt.nz/tools/2018-census-place-summaries/.